fix: use max_completion_tokens for GPT-5 models in LiteLLM provider #6980
base: main
Conversation
- GPT-5 models require max_completion_tokens instead of the deprecated max_tokens parameter
- Added detection for GPT-5 model variants (gpt-5, gpt5, GPT-5, etc.)
- Updated both createMessage and completePrompt methods to handle GPT-5 models
- Added comprehensive tests for GPT-5 model handling

Fixes #6979
@@ -107,16 +107,26 @@ export class LiteLLMHandler extends RouterProvider implements SingleCompletionHa
// Required by some providers; others default to max tokens allowed
let maxTokens: number | undefined = info.maxTokens ?? undefined

// Check if this is a GPT-5 model that requires max_completion_tokens instead of max_tokens
const isGPT5Model = modelId.toLowerCase().includes("gpt-5") || modelId.toLowerCase().includes("gpt5")
Consider extracting the GPT-5 model detection logic into a shared helper function. This logic (using modelId.toLowerCase().includes('gpt-5') || modelId.toLowerCase().includes('gpt5')) appears in both createMessage and completePrompt, and centralizing it would improve maintainability.
This comment was generated because it violated a code review rule: irule_tTqpIuNs8DV0QFGj.
Reviewing my own code because apparently I trust no one, not even myself.
@@ -107,16 +107,26 @@ export class LiteLLMHandler extends RouterProvider implements SingleCompletionHa
// Required by some providers; others default to max tokens allowed
let maxTokens: number | undefined = info.maxTokens ?? undefined

// Check if this is a GPT-5 model that requires max_completion_tokens instead of max_tokens
const isGPT5Model = modelId.toLowerCase().includes("gpt-5") || modelId.toLowerCase().includes("gpt5")
The model detection logic could be more precise. Currently, modelId.toLowerCase().includes("gpt-5") would match unintended models like "not-gpt-5000". Consider using a more specific pattern:

Suggested change:
- const isGPT5Model = modelId.toLowerCase().includes("gpt-5") || modelId.toLowerCase().includes("gpt5")
+ // Check if this is a GPT-5 model that requires max_completion_tokens instead of max_tokens
+ const modelLower = modelId.toLowerCase()
+ const isGPT5Model = modelLower.startsWith("gpt-5") || modelLower.startsWith("gpt5") || modelLower === "gpt5"
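As a quick sanity check of the stricter pattern, here is a standalone illustration (not code from the PR) of which ids it accepts:

// Stricter detection: anchor the match at the start of the model id.
const matchesGPT5 = (id: string): boolean => {
	const m = id.toLowerCase()
	return m.startsWith("gpt-5") || m.startsWith("gpt5")
}

console.log(matchesGPT5("gpt-5-turbo"))  // true
console.log(matchesGPT5("gpt5-preview")) // true
console.log(matchesGPT5("not-gpt-5000")) // false - no longer a false positive
console.log(matchesGPT5("openai/gpt-5")) // false - provider-prefixed ids would need extra handling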
@@ -179,6 +189,9 @@ export class LiteLLMHandler extends RouterProvider implements SingleCompletionHa
async completePrompt(prompt: string): Promise<string> {
const { id: modelId, info } = await this.fetchModel()

// Check if this is a GPT-5 model that requires max_completion_tokens instead of max_tokens
const isGPT5Model = modelId.toLowerCase().includes("gpt-5") || modelId.toLowerCase().includes("gpt5")
This detection logic is duplicated from line 111. Would it be cleaner to extract this into a helper method to maintain DRY principles? Something like:
private isGPT5Model(modelId: string): boolean {
const modelLower = modelId.toLowerCase()
return modelLower.startsWith("gpt-5") || modelLower.startsWith("gpt5") || modelLower === "gpt5"
}
messages: [systemMessage, ...enhancedMessages],
stream: true,
stream_options: {
include_usage: true,
},
}

// GPT-5 models require max_completion_tokens instead of the deprecated max_tokens parameter
if (isGPT5Model && maxTokens) {
// @ts-ignore - max_completion_tokens is not in the OpenAI types yet but is supported
Is there a way to avoid using @ts-ignore here? Could we extend the OpenAI types or create a custom interface that includes max_completion_tokens to maintain type safety? For example:
interface GPT5RequestOptions extends Omit<OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming, 'max_tokens'> {
max_completion_tokens?: number
max_tokens?: never
}
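For illustration, here is a minimal, self-contained sketch of how such an interface could be used; the GPT5RequestOptions shape comes from the comment above, the params/run names are placeholders, and a newer openai SDK may already type max_completion_tokens directly:

import OpenAI from "openai"

interface GPT5RequestOptions
	extends Omit<OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming, "max_tokens"> {
	max_completion_tokens?: number
}

// Typing the params as GPT5RequestOptions keeps type checking without @ts-ignore;
// the object remains structurally assignable to the SDK's streaming params.
const params: GPT5RequestOptions = {
	model: "gpt-5",
	messages: [{ role: "user", content: "Hello" }],
	stream: true,
	max_completion_tokens: 1024,
}

async function run(client: OpenAI) {
	return client.chat.completions.create(params)
}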
})

it("should use max_completion_tokens for various GPT-5 model variations", async () => {
const gpt5Variations = ["gpt-5", "gpt5", "GPT-5", "gpt-5-turbo", "gpt5-preview"]
Great test coverage! Consider adding edge cases like mixed case variations ("GpT-5", "gPt5") or models with additional suffixes ("gpt-5-32k", "gpt-5-vision") to ensure the detection works correctly for all possible GPT-5 model names.
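A rough sketch of such a test, following the it/expect style of the existing spec and assuming the detection logic were exported as a standalone helper (isGPT5ModelId is a hypothetical name), might look like:

it("detects edge-case GPT-5 model ids", () => {
	const shouldMatch = ["GpT-5", "gPt5", "gpt-5-32k", "gpt-5-vision"]
	const shouldNotMatch = ["gpt-4o", "not-a-gpt-model"]

	for (const id of shouldMatch) {
		expect(isGPT5ModelId(id)).toBe(true)
	}
	for (const id of shouldNotMatch) {
		expect(isGPT5ModelId(id)).toBe(false)
	}
})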
messages: [systemMessage, ...enhancedMessages],
stream: true,
stream_options: {
include_usage: true,
},
}

// GPT-5 models require max_completion_tokens instead of the deprecated max_tokens parameter
Consider adding a comment explaining why GPT-5 models require this special handling, perhaps with a link to relevant Azure/OpenAI documentation. This would help future maintainers (including future me) understand the context behind this workaround.
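For example, the comment could read roughly as follows (wording is a suggestion only, and the exact documentation link is left to the author):

// GPT-5 models reject the legacy max_tokens parameter and expect
// max_completion_tokens instead, so we branch on the model id here.
// See the OpenAI / Azure OpenAI chat completions reference for details.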
This can be merged after #7067, since that PR will update the OpenAI SDK and let us drop the type-error suppression comments.
This PR fixes the issue where GPT-5 models fail with LiteLLM due to using the deprecated max_tokens parameter instead of max_completion_tokens.

Problem

When using GPT-5 models with LiteLLM, users encounter an error because these models reject the deprecated max_tokens parameter and require max_completion_tokens instead.

Solution

- Updated the createMessage and completePrompt methods to use max_completion_tokens for GPT-5 models
- Non-GPT-5 models continue to use max_tokens

Testing

- Verified that non-GPT-5 models still use max_tokens as before

Fixes #6979
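In outline, the change amounts to the following conditional, shown here as a simplified, self-contained sketch assembled from the diff above (applyTokenLimit and requestOptions are illustrative names, not identifiers from the PR):

function applyTokenLimit(
	requestOptions: Record<string, unknown>,
	modelId: string,
	maxTokens?: number,
): void {
	const modelLower = modelId.toLowerCase()
	const isGPT5Model = modelLower.includes("gpt-5") || modelLower.includes("gpt5")

	if (isGPT5Model && maxTokens) {
		// GPT-5 models require max_completion_tokens instead of the deprecated max_tokens parameter
		requestOptions.max_completion_tokens = maxTokens
	} else if (maxTokens) {
		// Non-GPT-5 models keep the existing max_tokens behavior
		requestOptions.max_tokens = maxTokens
	}
}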
Important

Fixes GPT-5 model handling in LiteLLM by using max_completion_tokens instead of max_tokens, with comprehensive test coverage.

- GPT-5 handling in LiteLLMHandler now uses max_completion_tokens instead of the deprecated max_tokens.
- The createMessage and completePrompt methods handle GPT-5 models.
- Non-GPT-5 models continue to use max_tokens.
- Tests added in lite-llm.spec.ts for GPT-5 model handling.
- Non-GPT-5 models verified to still use max_tokens.
- Tests added in lite-llm.spec.ts for model variations.